This paper analyzes an adaptive training algorithm for adjusting the 
tap weights of a tapped delay line filter to minimize mean-square inter- 
symbol interference for synchronous data transmission. The significant 
feature of the adjustment procedure is that convergence is guaranteed for 
all channel response pulses, even for very severe amplitude and phase
distortion.

The author examines convergence, rate of convergence, and the effect 
of noisy observations of the received pulses, and he shows that the noisy 
observations result in a random sequence of tap weight settings whose mean 
value converges to a suboptimal setting. The mean-square deviation of the 
tap weights from the suboptimal values is asymptotically bounded with a 
bound that can be made as small as desired by sufficiently reducing the 
speed of convergence. 

The suboptimality arising here results from the use of isolated test pulses 
for the training signal. However, a training scheme using pseudorandom 
sequences or the actual data signal does not suffer from the suboptimality 
effect. Hence, although of possible utility in other pulse shaping applications, 
the technique presented here appears to be primarily of value in providing a 
conceptual framework for the closely related but more practical techniques 
to be examined in the sequel to this paper to be published shortly. 

I. INTRODUCTION 

A common approach to data transmission is to code the amplitudes 
of successive pulses in a periodic pulse train with a discrete set of 
possible amplitude levels. The coded pulse train is then linearly- 
modulated, transmitted through the channel, demodulated, equalized, 
and synchronously sampled and quantized. As a result of dispersion 
of the pulse shape by the channel, the number of detectable amplitude 
levels has very often been limited by intersymbol interference rather 
than by additive noise. 

In principle, if the channel is known precisely it is virtually always 
possible to design an equalizer that will make the intersymbol inter- 
ference (at the sampling instants) arbitrarily small. However, in 
practice a channel is random in the sense of being one of an ensemble 
of possible channels. Consequently, a fixed equalizer designed on av- 
erage channel characteristics may not adequately reduce intersymbol 
interference. An adaptive equalizer is then needed which can be 
"trained," with the guidance of a suitable training signal transmitted 
through the channel, to adjust its parameters to optimal values. If the 
channel is also time-varying, an adaptive equalizer operating in a 
tracking mode is needed which can update its parameter values by 
tracking the changing channel characteristics during the course of 
normal data transmission. In both cases the adaptation may be 
achieved by observing or estimating the error between actual and 
desired equalizer responses and using this error to estimate the di- 
rection in which the parameters should be changed to approach the 
optimal values. 

A simple and effective technique for adaptive equalization was de- 
veloped by Lucky using the tapped delay line filter structure for 
the equalizer. 1, 2 The main limitation of this technique is that
convergence of the tap weight adjustment algorithm is assured only for
relatively low dispersion channels. The convergence condition re- 
quires that the dispersed pulse shape have adequate quality so that, 
in the absence of noise, error-free binary data transmission would be 
possible without equalization. In other words the dispersed pulse must 
have an open binary "eye." 

Using an approach to adaptation 3, 4 with virtually unrestricted con- 
vergence properties, Lucky and Rudin subsequently proposed and 
implemented an adaptive equalizer for minimizing the mean square 
error in frequency response of an analog channel. 5, 6 This approach 
was applied to synchronous data transmission by the author and
independently by Lytle and by Niessen. 7-9 An implementation of the
technique was described by Niessen and Drouilhet. 10 It has also been 
implemented for data communication at Bell Laboratories. 

In this paper the approach is used for synchronous data transmis- 
sion in a training mode where a sequence of isolated pulses is used 
as a test signal. The technique may be viewed equally as an adaptive 
design procedure for a sampled-data pulse shaping filter where the 
error criterion is to minimize the mean square error between actual 
and desired pulse shapes at the filter output. The important feature 
of the technique is that convergence is achieved for any channel 
pulse response whatever, thereby including highly dispersed pulses 
for which even binary data transmission would be impossible without
equalization. Of particular interest are: (i) the optimality condition,
analogous to Lucky's zero forcing condition, that results from the
change from a summed absolute error to a summed squared error
criterion, 1 (ii) the manner in which noisy observations introduce
randomness in the iterative corrections to the weights and the resulting
stochastic convergence properties, (iii) the possibility of applying the
technique where isolated pulses applied to a filter must be used to 
adaptively adjust the filter for optimum pulse shaping (unrelated to 
equalization), and (iv) the conceptual framework for the more prac- 
tical adaptation techniques to be described in a sequel to this paper, 
planned for publication soon. 

Perhaps the earliest application of the tapped delay line or "trans- 
versal" filter to pulse shaping for data transmission was made by 
W. P. Boothroyd and E. M. Creamer. 11 Tufts and George have shown 
that under a mean-square error criterion the optimal receiver struc- 
ture includes a tapped-delay line filter with delay between taps equal 
to the symbol period. 12, 13 Aaron and Tufts have also shown that the
same receiver structure is needed to minimize the average error prob- 
ability for binary data transmission. 14 

The basic approach to adaptive adjustment of a set of weights 
where a mean-square error criterion is used with a gradient search 
procedure was considered by Widrow and Hoff who noticed that no 
derivative computation is needed. 3 Narendra and McBride proposed 
a self-optimizing Wiener filter using a continuous-time gradient 
algorithm and a filter structure whose transfer function is a weighted 
sum of fixed functions. 4 Koford and Groner used a mean-square error 
criterion and a gradient learning algorithm to find an optimum set 
of weights for pattern classifying. 15 Widrow described a general
adaptive filtering problem with the tapped delay line filter. 16 Coll and
George discussed the performance of George's optimum equalizer and 
indicated a possible adaptive adjustment technique. 17 Lucky and Rudin 
were the first to apply the mean square error criterion with the 
gradient search procedure to the field of adaptive equalization. 5, 6
This paper expands on a short presentation given at an international 
symposium on information theory. 18 

II. PERFORMANCE OBJECTIVES FOR EQUALIZATION 

The objective of equalization, viewed as a pulse shaping problem, 
is to adjust the parameters of the equalizer to a setting which mini- 
mizes a suitable measure of the error between actual and desired 
pulse shapes. For the usual synchronous data transmission application,
the desired pulse shape is one with the Nyquist property that
the sample values $y_k$ at the sampling instants $kT$ are given by
$y_k = \delta_{kr}$, where $\delta_{kr}$ is unity for $k = r$ and zero for
all other integers $k$. The criterion used by Lucky 1 is peak distortion,
$D$, given by

$$D = \sum_{k \neq r} |y_k| \,/\, |y_r| .$$

An alternate criterion of interest is the mean square distortion $E$,
defined by

$$E = \sum_{k \neq r} y_k^2 \,/\, y_r^2 .$$

The physical interpretation of the peak distortion is that it is di- 
rectly related to eye opening and determines the error probability for 
a worst case message pattern. The mean square distortion has a dif- 
ferent interpretation. If the message pattern is such that the trans- 
mitted level for each time slot is statistically independent of the levels 
for other time slots, then the variance of the intersymbol interference 
in a given time slot is proportional to the mean square distortion. If 
the pulse shape has a large number of small sidelobes so that the 
intersymbol interference is normally distributed, then minimizing 
mean square distortion is equivalent to minimizing error rate. 

Closely related to the mean square distortion is the mean square
error

$$\mathcal{E} = \sum_k (y_k - d_k)^2 \tag{1}$$

where $d_k$ is the desired pulse sample value at time instant $kT$. For the
usual equalization problem where $d_k = \delta_{kr}$, the measure
$\mathcal{E}$ has virtually the same interpretation as $E$; however, $E$ is
a normalized measure independent of pulse amplitude while $\mathcal{E}$
depends on both shape and amplitude. Optimization of the tapped delay line
equalizer with respect to either criterion leads to equivalent results.
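
As an illustrative sketch (not part of the original analysis), the three
measures defined in this section can be evaluated for a short sampled pulse.
The pulse samples, the reference index r, and the function names below are
hypothetical choices made only for this example.

```python
def peak_distortion(y, r):
    """Peak distortion D: sum of |y_k| over k != r, divided by |y_r|."""
    return sum(abs(v) for k, v in enumerate(y) if k != r) / abs(y[r])


def mean_square_distortion(y, r):
    """Mean square distortion E: sum of y_k**2 over k != r, divided by y_r**2."""
    return sum(v * v for k, v in enumerate(y) if k != r) / (y[r] ** 2)


def mean_square_error(y, d):
    """Mean square error of equation (1): sum of (y_k - d_k)**2."""
    return sum((yk - dk) ** 2 for yk, dk in zip(y, d))


# Hypothetical equalized pulse with its main sample at index r = 2.
y = [0.1, -0.2, 1.0, 0.3, -0.1]
r = 2
d = [1.0 if k == r else 0.0 for k in range(len(y))]  # d_k = delta_{kr}

D = peak_distortion(y, r)
E = mean_square_distortion(y, r)
mse = mean_square_error(y, d)
```

For this pulse D is below unity, and because the main sample equals the
desired value of one, the mean square error coincides with E.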

III. FORMULATION 

Consider the transversal equalizer with $N$ taps and tap spacing $T$
equal to the symbol period. Let $c_k$ be the weight at the $k$th tap for
$k = 0, 1, \ldots, N-1$ so that the input output relation of the transversal
filter at the sample times is

$$y_n = \sum_{k=0}^{N-1} c_k x_{n-k} = \mathbf{c}'\mathbf{x}_n \tag{2}$$

where $x_k$ and $y_k$ denote the input and output pulse samples, respectively,
at time instants $kT$, $\mathbf{c} = (c_0, c_1, \ldots, c_{N-1})$ is the tap
weight vector, and $\mathbf{x}_n = (x_n, x_{n-1}, \ldots, x_{n-N+1})$ is the
sample memory state of the delay line at the time instant $nT$; the vectors
$\mathbf{c}$ and $\mathbf{x}_n$ are to be regarded as column matrices, and the
prime denotes the transpose. We assume that the input sequence $x_k$ has
finite energy. Let $e_n = y_n - d_n$. Then from equation (1), using (2), the
gradient of the error with respect to $\mathbf{c}$ may be written as

$$\nabla\mathcal{E} = 2 \sum_k e_k \mathbf{x}_k . \tag{3}$$

Therefore the optimality condition for minimum error, $\nabla\mathcal{E} = 0$,
is equivalent to the requirement that the (deterministic) cross-correlation
between the input sequence $x_k$ and output error sequence $e_k$ must have
zeros for the $N$ components with index values corresponding to the index
values of the available tap weights. That is,

$$\phi_{xe}(k) = \sum_n e_n x_{n-k} = 0 \quad \text{for } k = 0, 1, 2, \ldots, N-1 .$$

This condition has an interesting similarity to Lucky's condition 
which states that the peak distortion, D, is minimized when the error 
sequence $e_n$ has zeros for the $N$ components with index values
corresponding to the index values of the available tap weights. 1 An
important distinction is that Lucky's condition is generally not valid
when the input pulse distortion D exceeds unity, while the mean 
square optimizing condition is valid for any input pulse with finite 
mean square distortion. 

Using equation (2), the gradient (3) can be expressed explicitly as 
a function of the tap weight vector c, namely: 

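$$\nabla\mathcal{E} = 2(A\mathbf{c} - \mathbf{g}) \tag{4}$$

where

$$A = \sum_n \mathbf{x}_n \mathbf{x}_n' \quad \text{and} \quad \mathbf{g} = \sum_n d_n \mathbf{x}_n .$$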

Notice that A is symmetric and positive definite (see Appendix A). 
Setting equation (4) equal to zero yields the solution for the optimum 
tap weight vector $\mathbf{c}^*$,

$$\mathbf{c}^* = A^{-1}\mathbf{g} .$$

Using equation (2), the error expression given by equation (1) may 
be expressed in the convenient form: 

$$\mathcal{E}(\mathbf{c}) = \mathcal{E}(\mathbf{c}^*) + (\mathbf{c} - \mathbf{c}^*)' A (\mathbf{c} - \mathbf{c}^*) \tag{5}$$

which shows explicitly the simple quadratic nature of the error surface
and the unique optimality of the minimizing weight vector $\mathbf{c}^*$. It can
be shown that the residual error $\mathcal{E}(\mathbf{c}^*)$ can be made as small
as desired for all channels of practical interest by using a sufficiently
large number, $N$, of taps. 19

It is intuitively reasonable that successive corrections to the tap 
weight vector in the direction of steepest descent of the error surface 
should lead to the minimum error where c = c*. This is the idea of the 
well-known 20 gradient algorithm: 

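$$\mathbf{c}_{i+1} = \mathbf{c}_i - \tfrac{1}{2}\alpha \nabla\mathcal{E}(\mathbf{c}_i) \tag{6}$$

where $\alpha$ is a suitably small positive proportionality constant,
$\mathbf{c}_0$ is arbitrary, and $\mathbf{c}_i$ is the tap weight vector after
the $i$th iteration.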

The significant feature of the gradient algorithm for our quadratic 
error surface (5) is that the gradient can be conveniently evaluated 
without knowledge of the error surface itself. We have seen from equa- 
tion (3) that the components of the gradient vector are values of the 
crosscorrelation between the input sequence and the output error 
sequence. This suggests the conceptually simple implementation 
where an isolated test pulse is transmitted through the channel and the 
requisite crosscorrelation values are formed by multiplying the de- 
layed input pulse with the error pulse, sampling, and summing (or 
averaging). The tap weights are then incremented according to (6), 
the old crosscorrelation values "dumped" and a new iteration is begun 
with the transmission of a new test pulse. 
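
The noiseless iteration can be sketched in a few lines. This is a minimal
illustration under stated assumptions, not the implementation described in
the paper: the three-sample pulse, the step size 0.3, the iteration count,
and all function names are hypothetical. Each pass forms the
crosscorrelation between the delay line contents and the error pulse (one
half of the gradient in equation (3)) and then increments the taps as in
equation (6).

```python
def delay_line_states(x, n_taps):
    """Return x_n = (x_n, x_{n-1}, ..., x_{n-N+1}) while the pulse is in the line."""
    padded = list(x) + [0.0] * (n_taps - 1)
    return [
        [padded[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        for n in range(len(padded))
    ]


def half_gradient(c, states, d):
    """Crosscorrelation sum_n e_n x_n, i.e. one half of the gradient (3)."""
    grad = [0.0] * len(c)
    for x_n, d_n in zip(states, d):
        e_n = sum(ci * xi for ci, xi in zip(c, x_n)) - d_n
        for k, xi in enumerate(x_n):
            grad[k] += e_n * xi
    return grad


def squared_error(c, states, d):
    """Mean square error (1) for tap weights c."""
    return sum(
        (sum(ci * xi for ci, xi in zip(c, x_n)) - d_n) ** 2
        for x_n, d_n in zip(states, d)
    )


def train(x, d, n_taps, alpha, iterations):
    """Gradient algorithm (6), starting from all-zero tap weights."""
    states = delay_line_states(x, n_taps)
    c = [0.0] * n_taps
    for _ in range(iterations):
        phi = half_gradient(c, states, d)
        c = [ci - alpha * pi for ci, pi in zip(c, phi)]
    return c, states


x = [0.2, 1.0, 0.5]              # hypothetical dispersed test pulse
d = [0.0, 0.0, 1.0, 0.0, 0.0]    # desired output: a single unit sample
c, states = train(x, d, n_taps=3, alpha=0.3, iterations=500)
residual = squared_error(c, states, d)
```

At the end of the run the crosscorrelation values returned by
half_gradient are numerically zero, which is the optimality condition
stated earlier in this section.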

The error pulse is formed by subtracting from the equalizer output
pulse an "ideal" pulse whose sample values are the desired values $d_k$;
the ideal pulse is locally generated at the appropriate time. The basic
scheme is shown in Fig. 1. Naturally, the summation given by equation
(3) cannot be performed over an infinite time interval. Suppose $kT$
is a practical upper bound on the possible time duration of the input
pulse, and $\ell T$ is the time interval between successive test pulses,
with $\ell T > kT$ and with $\ell$ and $k$ positive integers. Then if we
include the effect of perturbing receiver noise samples $n_j$ and $z_j$ at
the equalizer input and output, respectively, the measured crosscorrelation
vector $\boldsymbol{\phi}_i$ after the $i$th iteration is given by:

$$\boldsymbol{\phi}_i = \sum_j (\mathbf{x}_{j-i\ell} + \mathbf{n}_j)(e_{j-i\ell} + z_j) . \tag{7}$$

In the noiseless case the estimate $\boldsymbol{\phi}_i$ reduces to one-half
the deterministic gradient, that is, $\tfrac{1}{2}\nabla\mathcal{E}(\mathbf{c}_i)$,
under the assumption that the pulse sequence $x_j$ and desired sequence $d_j$
are virtually zero outside of the summation interval.

Fig. 1. Four tap training mode adaptive equalizer.

IV. CONVERGENCE PROPERTIES 

In the presence of noise the tap weight corrections contain undesired 
random components consisting of products of input and output noise 
samples and products of pulse and noise samples. As a result, the
random tap weights no longer converge to the optimal values but 
instead approach some neighborhood of a suboptimal setting and then 
fluctuate randomly about this setting. The error between the optimal 
and suboptimal settings is small for low noise levels and decreases 
with increasing signal-to-noise ratios. The size of the fluctuation
neighborhood about the suboptimal setting is proportional to the noise level
but can be made as small as desired by making the training time
sufficiently long.

Assume the noise samples $n_j$ have zero mean and finite variance $\sigma^2$.
Define the vector $\mathbf{n}_k = (n_k, n_{k-1}, \ldots, n_{k-N+1})$, to be
regarded as a column matrix. Then the output noise samples of the equalizer
are:

$$z_k = \mathbf{c}'\mathbf{n}_k . \tag{8}$$

Define the matrix $B = E(\mathbf{n}_k \mathbf{n}_k')$, where $E(\cdots)$ denotes
the expected value. Notice that $B$ is symmetric and positive semidefinite.

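To formulate the iterative equations describing the tap weight behavior
in the presence of noise, apply equations (2) and (8) to (7) to
show how the gradient estimate depends on the tap weight vector:

$$\boldsymbol{\phi}_i = \sum_j (\mathbf{x}_{j-i\ell} + \mathbf{n}_j)\left[(\mathbf{x}_{j-i\ell} + \mathbf{n}_j)'\mathbf{c}_i - d_{j-i\ell}\right] .$$

Hence

$$\boldsymbol{\phi}_i = H_i\mathbf{c}_i - \mathbf{g} - \mathbf{v}_i , \tag{9}$$

where $H_i$ is the random symmetric matrix

$$H_i = \sum_j (\mathbf{x}_{j-i\ell} + \mathbf{n}_j)(\mathbf{x}_{j-i\ell} + \mathbf{n}_j)' \tag{10}$$

and

$$\mathbf{v}_i = \sum_j \mathbf{n}_j d_{j-i\ell} . \tag{11}$$

Let $\mathcal{G} = E(H_i)$, the expected value of $H_i$. Then equation (10) yields

$$\mathcal{G} = A + kB , \tag{12}$$

which is positive definite since $A$ is positive definite and $B$ is positive
semidefinite.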

It is convenient to examine the random variation of the tap weight
vector $\mathbf{c}_i$ about the suboptimal setting defined by

$$\hat{\mathbf{c}} = \mathcal{G}^{-1}\mathbf{g} , \tag{13}$$

and let $\mathbf{q}_i = \mathbf{c}_i - \hat{\mathbf{c}}$. From equation (12) it
is evident that the suboptimal setting $\hat{\mathbf{c}}$ approaches the optimal
setting $\mathbf{c}^*$ as the ratio of noise variance to input pulse sequence
energy approaches zero. The iterative algorithm may be expressed in the form

$$\mathbf{q}_{i+1} = \mathbf{q}_i - \alpha\boldsymbol{\phi}_i , \tag{14}$$

$$\boldsymbol{\phi}_i = H_i\mathbf{q}_i + \mathbf{h}_i \tag{15}$$

where

$$\mathbf{h}_i = H_i\hat{\mathbf{c}} - \mathbf{g} - \mathbf{v}_i . \tag{16}$$

Equations (14) and (15) constitute a system of first-order stochastic
difference equations with a forcing function $\mathbf{h}_i$ which is
statistically dependent on the stochastic state matrix $H_i$. We assume that
the perturbing noise samples in different iterations are uncorrelated, so
that $H_i$ and $\mathbf{h}_i$ are independent of $H_j$ and $\mathbf{h}_j$ for
$i \neq j$. Notice that the expected value of any function of $H_i$ and
$\mathbf{h}_i$ is independent of $i$. Under these conditions it is proved in
Appendix C that for suitably small values of $\alpha$ the mean value of the
solution vector $\mathbf{q}_i$ approaches zero as $i \to \infty$ and the sum of
the variances of the components of $\mathbf{q}_i$ is bounded with a
bound that approaches zero as $\alpha$ approaches zero. Consequently the
mean value of the tap weight vector converges to the suboptimal setting
$\hat{\mathbf{c}}$ while the actual tap weights fluctuate randomly about the
converging mean values with a variability that can be made arbitrarily
small.
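
A single-tap toy simulation, offered only as a hedged sketch, makes the
bias toward the suboptimal setting visible. Everything here is an assumption
of the example rather than part of the paper: the two-sample pulse, the noise
level, the step size, the seed, and the function name. With one tap the
matrices reduce to scalars, so the optimal setting is g/A and the suboptimal
setting is g/(A + k sigma^2), with k = 2 samples summed per iteration.

```python
import random


def noisy_training(x, d, sigma, alpha, iterations, seed=0):
    """Single-tap version of (7) and (14) with Gaussian input noise."""
    rng = random.Random(seed)
    c = 0.0
    history = []
    for _ in range(iterations):
        phi = 0.0
        for x_j, d_j in zip(x, d):
            u = x_j + rng.gauss(0.0, sigma)  # noisy delay line sample
            phi += u * (u * c - d_j)         # noisy input times noisy error
        c -= alpha * phi
        history.append(c)
    return history


x = [1.0, 0.5]      # hypothetical pulse samples
d = [1.0, 0.0]      # desired samples
sigma = 0.5

A = sum(v * v for v in x)
g = sum(dv * xv for dv, xv in zip(d, x))
c_opt = g / A                                # noiseless optimum
c_sub = g / (A + len(x) * sigma ** 2)        # suboptimal setting, scalar (13)

history = noisy_training(x, d, sigma, alpha=0.01, iterations=20000)
mean_tail = sum(history[10000:]) / 10000
```

In this sketch the time average of the tap weight settles near c_sub
(about 0.57) rather than c_opt (0.8), in line with the discussion of
equation (13).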

Notice from Appendix C that the norm of the mean solution vector
$E(\mathbf{q}_i)$ is reduced in each iteration at least by the factor $\zeta$,
the spectral norm 20 of $I - \alpha\mathcal{G}$. Let $\rho_1$ and $\rho_N$
denote the minimum and maximum eigenvalues, respectively, of $\mathcal{G}$.
Then

$$\zeta = \max\{\, |1 - \alpha\rho_1| ,\ |1 - \alpha\rho_N| \,\} . \tag{17}$$

(For proof see p. 24 of Ref. 20.)

Then for $0 < \alpha < 2/(\rho_1 + \rho_N)$, we obtain $\zeta = 1 - \alpha\rho_1$.
Consequently, while decreasing $\alpha$ offers a smaller bound on variability
of the tap weight vector, increasing $\alpha$ assures a stronger bound on
convergence rate. For the training mode it is likely that speed of adaptation
will be relatively unimportant so that a very small value of $\alpha$ could
be used to approach a tap weight setting that is very close to the
suboptimal setting.
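
The dependence of the convergence factor on the step size can be checked
numerically. The sketch below assumes only that the factor in equation (17)
is the spectral norm of a symmetric matrix, that is, the largest absolute
eigenvalue of I minus alpha times the mean matrix; the eigenvalues 0.5 and
2.0 and the function name are hypothetical.

```python
def convergence_factor(alpha, rho_min, rho_max):
    """Largest absolute eigenvalue of I - alpha*G for extreme eigenvalues of G."""
    return max(abs(1.0 - alpha * rho_min), abs(1.0 - alpha * rho_max))


rho_1, rho_N = 0.5, 2.0   # hypothetical extreme eigenvalues

small_step = convergence_factor(0.4, rho_1, rho_N)   # alpha < 2/(rho_1 + rho_N)
edge_step = convergence_factor(0.8, rho_1, rho_N)    # alpha = 2/(rho_1 + rho_N)
too_large = convergence_factor(1.0, rho_1, rho_N)    # alpha = 2/rho_N
```

For the small step the factor equals 1 - alpha*rho_1; at alpha = 2/rho_N
the factor reaches unity and the bound no longer guarantees convergence of
the mean.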

It is useful to obtain bounds on the eigenvalues of $\mathcal{G}$ which can be
determined without specific knowledge of the channel characteristics.
If $x(t)$ denotes the channel pulse response and $n(t)$ the additive receiver
noise so that the sampled values used earlier are given by $x_k = x(kT)$
and $n_k = n(kT)$, then the sampled spectrum $X^*(\omega)$ of $x_k$ is

$$X^*(\omega) = \sum_r x_r e^{-j\omega rT} = \frac{1}{T}\sum_r X(\omega - 2\pi r/T)$$

and the sampled spectral density $S^*(\omega)$ of $n_k$ is

$$S^*(\omega) = \sum_r E(n_j n_{j+r})\, e^{-j\omega rT} = \frac{1}{T}\sum_r S(\omega - 2\pi r/T)$$

where $X(\omega)$ is the Fourier transform of $x(t)$ and $S(\omega)$ is the
spectral density of $n(t)$. Let $m$ and $M$ denote the infimum and supremum,
respectively, of $|X^*(\omega)|^2 + kS^*(\omega)$ so that

$$m \le |X^*(\omega)|^2 + kS^*(\omega) \le M . \tag{18}$$

In all cases of practical interest $M$ will be finite; furthermore generally
$m$ will be greater than zero. It is shown in Appendix B that each eigenvalue,
$\rho_i$, of $\mathcal{G}$ will be bounded according to

$$m \le \rho_i \le M . \tag{19}$$

To illustrate the use of this bound, notice from Appendix C that the
condition for convergence of the mean tap weight vector to the suboptimal
solution is that $\alpha < 2/\rho_N$. Thus a sufficient condition is that

$$\alpha < 2/M . \tag{20}$$

Furthermore, the mean tap setting converges exponentially with the
convergence factor $\zeta$, given by equation (17). Hence it can be inferred
that the choice of $\alpha$ which provides the strongest bound (least value of
$\zeta$) is $\alpha = 2/(\rho_1 + \rho_N)$, yielding

$$\zeta = \frac{\rho - 1}{\rho + 1}$$

where $\rho = \rho_N/\rho_1$. Using the bounds given in (19) we obtain
$\rho \le M/m$, and so

$$\zeta \le \frac{M - m}{M + m} . \tag{21}$$

Therefore, for the best choice of $\alpha$, convergence of the mean proceeds at
least at a rate given by the geometric factor $(M - m)/(M + m)$. Thus
useful information regarding the convergence speed can be determined
without knowledge of the channel characteristics.
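
As a rough numerical sketch of how the bounds in (18) through (21) might be
used, the sampled spectrum of a short pulse can be scanned on a frequency
grid. The pulse, the grid size, and the function name are hypothetical, the
grid only approximates the infimum and supremum, and the noise term is
modeled as a flat constant, which assumes uncorrelated noise samples.

```python
import cmath
import math


def spectrum_bounds(x, k_noise=0.0, T=1.0, grid=2048):
    """Approximate m and M of (18) over one period of the sampled spectrum.

    k_noise stands for the flat term k * sigma**2 (white-noise assumption).
    """
    values = []
    for i in range(grid):
        w = (-math.pi + 2.0 * math.pi * i / grid) / T
        X = sum(xr * cmath.exp(-1j * w * r * T) for r, xr in enumerate(x))
        values.append(abs(X) ** 2 + k_noise)
    return min(values), max(values)


x = [0.2, 1.0, 0.5]                 # hypothetical pulse samples
m, M = spectrum_bounds(x)           # noiseless case
alpha_limit = 2.0 / M               # sufficient condition (20)
factor_bound = (M - m) / (M + m)    # geometric factor of (21)
```

For this pulse the extremes fall at zero frequency and at half the sampling
rate, giving M close to 2.89 and m close to 0.09, so the step size only needs
to stay below roughly 0.69 to satisfy (20).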

V. CONCLUSION 

The degree of suboptimality of the tap weight setting reached by the 
training algorithm may or may not be consequential, depending on the 
application. In applications where multilevel pulse transmission with a 
large number of levels could be achieved with adequate equalization, 
the signal-to-noise ratio is necessarily very high and therefore the degree 
of suboptimality is not large. Even when the noise level is fairly sub- 
stantial the suboptimal setting may still be adequate if the error surface 
given by $\mathcal{E}(\mathbf{c})$ is "shallow" in a large neighborhood of the
minimum. Then a fairly large departure of $\hat{\mathbf{c}}$ from $\mathbf{c}^*$
may correspond to a relatively small increase in mean-square error. Also, if
training mode adaptation is used as a prelude to tracking mode adaptation, a
fairly large
degree of suboptimality may be a tolerable starting point for a tracking 
mode operation such as the one we plan to describe in a future paper. 

When the noise level is substantial the criterion for optimality used 
here becomes inadequate because it does not consider the effect of the 
equalizer on the receiver noise. The price of reducing intersymbol in- 
terference may be a sizable increase in noise level at the equalizer 
output. In our future paper the error criterion is modified to include 
noise with the result that the problem of suboptimality does not arise. 

The random fluctuation of the tap weights which prevents true con- 
vergence to the suboptimal setting can be eliminated by reducing the 
proportionality constant $\alpha$ in each iteration using a sequence of step
sizes $\alpha_k$ with the properties

$$\sum_k \alpha_k = \infty \quad \text{and} \quad \sum_k \alpha_k^2 < \infty .$$

It may then be shown that the tap weight vector converges to the 
suboptimal solution with probability 1. The proof uses stochastic 
approximation theory and follows the lines taken by Tong and Liu 
who considered a training mode algorithm for low dispersion chan- 
nels. 21 However, this modification complicates the implementation 
somewhat and cannot be applied to the tracking mode adaptation 
problem. 
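
The effect of a decreasing step size can be illustrated with a scalar toy
recursion. This is only a sketch under simplifying assumptions and is not
the proof technique referred to above: the state matrix is replaced by the
constant 1, the forcing term is unit-variance Gaussian noise, and the
schedule 1/(k+1), the seed, and the function names are hypothetical. The
schedule 1/(k+1) has a divergent sum and a convergent sum of squares.

```python
import random


def run(step, iterations, seed=1):
    """Scalar form of (29) and (30) with H = 1 and h_k = unit Gaussian noise."""
    rng = random.Random(seed)
    q = 1.0
    history = []
    for k in range(iterations):
        phi = q + rng.gauss(0.0, 1.0)
        q -= step(k) * phi
        history.append(q)
    return history


def tail_mean_square(history, count=1000):
    """Average of q**2 over the last `count` iterations."""
    tail = history[-count:]
    return sum(v * v for v in tail) / len(tail)


constant = run(lambda k: 0.5, 10000)               # fixed step size
decreasing = run(lambda k: 1.0 / (k + 1), 10000)   # step sizes shrinking to zero

ms_constant = tail_mean_square(constant)
ms_decreasing = tail_mean_square(decreasing)
```

With the fixed step the deviation keeps fluctuating at a steady level,
whereas with the decreasing schedule the fluctuation dies away as the
iterations proceed.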

APPENDIX A 

Proof that A is Positive Definite 
The matrix $A$ is defined by

$$A = \sum_{k=-\infty}^{\infty} \mathbf{x}_k \mathbf{x}_k' . \tag{22}$$

Consequently

$$\mathbf{c}'A\mathbf{c} = \sum_{k=-\infty}^{\infty} \mathbf{c}'\mathbf{x}_k \mathbf{x}_k'\mathbf{c} = \sum_{k=-\infty}^{\infty} y_k^2 .$$

But the sequence $y_k$ is the convolution of the $x_k$ sequence with the finite
tap weight sequence $c_k$. Hence, using Parseval's equality,

$$\mathbf{c}'A\mathbf{c} = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} |X^*(\omega)|^2\, |C(\omega)|^2 \, d\omega , \tag{23}$$

where

$$C(\omega) = \sum_{k=0}^{N-1} c_k e^{-j\omega kT} .$$

Equation (23) shows immediately that $\mathbf{c}'A\mathbf{c}$ is nonnegative for
all vectors $\mathbf{c}$. Also, $C(\omega)$ can have only isolated zeros and
$|X^*(\omega)|$ is square integrable since the input pulse has finite mean
square distortion. It may then be inferred that $\mathbf{c}'A\mathbf{c} > 0$
unless $\mathbf{c} = 0$, which proves that $A$ is positive definite.

APPENDIX B 

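Bounds on the Eigenvalues of $\mathcal{G}$

Since $B = E(\mathbf{n}_k \mathbf{n}_k')$, the quadratic form
$\mathbf{c}'B\mathbf{c}$ is the mean squared value of the response of the
equalizer with weight vector $\mathbf{c}$ to the input noise $n_k$.
Consequently

$$\mathbf{c}'B\mathbf{c} = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} S^*(\omega)\, |C(\omega)|^2 \, d\omega . \tag{24}$$

Combining equations (23) and (24) yields

$$\mathbf{c}'\mathcal{G}\mathbf{c} = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} \{ |X^*(\omega)|^2 + kS^*(\omega) \}\, |C(\omega)|^2 \, d\omega . \tag{25}$$

Applying to equation (25) the bounds $m$ and $M$ given by equation (18)
yields

$$m\, \mathbf{c}'\mathbf{c} \le \mathbf{c}'\mathcal{G}\mathbf{c} \le M\, \mathbf{c}'\mathbf{c} . \tag{26}$$

Let $\mathbf{c}$ be the eigenvector of $\mathcal{G}$ corresponding to eigenvalue
$\rho_i$. Then $\mathcal{G}\mathbf{c} = \rho_i \mathbf{c}$ and equation (26) yields

$$m \le \rho_i \le M \tag{27}$$

which provides a convenient bound for the largest and smallest eigenvalues
of $\mathcal{G}$.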

APPENDIX C 

Convergence Proof 

To examine the convergence properties of the tap weight adjustment 
algorithm, it is convenient to define the norm of a random vector u as 

$$\|\mathbf{u}\| = [E(\mathbf{u}'\mathbf{u})]^{1/2} , \tag{28}$$

so that the squared norm of $\mathbf{u}$ is the sum of the second moments of the
components of $\mathbf{u}$. For a deterministic vector the norm reduces to the
usual Euclidean norm. The norm of a deterministic matrix will denote the
usual spectral norm. 20

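Theorem: Let $H_k$ be a sequence of random symmetric $N \times N$ matrices
and $\mathbf{h}_k$ a sequence of random $N$-tuple column vectors. Suppose $H_k$
and $\mathbf{h}_k$ are stationary in $k$ with $H_k$ and $\mathbf{h}_k$
independent of $H_j$ and $\mathbf{h}_j$ for $k \neq j$. Assume $\mathbf{h}_k$ has
zero mean, and the elements of $H_k$ and $\mathbf{h}_k$ have finite variance,
$E H_k = \mathcal{G}$, independent of $k$ with $\mathcal{G}$ positive definite.
Define the random vector sequence $\mathbf{q}_k$ according to:

$$\mathbf{q}_{k+1} = \mathbf{q}_k - \alpha\boldsymbol{\phi}_k \tag{29}$$

where

$$\boldsymbol{\phi}_k = H_k\mathbf{q}_k + \mathbf{h}_k \tag{30}$$

for $k = 0, 1, 2, \ldots$ and $\mathbf{q}_0$ is an arbitrary deterministic
vector. Then for $\alpha$ positive and sufficiently small,

$$\lim_{k \to \infty} \|E\mathbf{q}_k\| = 0 \tag{31a}$$

and

$$\limsup_{k \to \infty} \|\mathbf{q}_k\| \le \gamma(\alpha) \tag{31b}$$

with $\gamma(\alpha)$, given in (47), satisfying:

$$\lim_{\alpha \to 0} \gamma(\alpha) = 0 . \tag{32}$$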

Proof: Combining equations (29) and (30) yields

$$\mathbf{q}_{k+1} = (I - \alpha H_k)\mathbf{q}_k - \alpha\mathbf{h}_k . \tag{33}$$

Noting that $\mathbf{q}_k$ is independent of $H_k$, taking the expected value in
equation (33), we find

$$E(\mathbf{q}_{k+1}) = (I - \alpha\mathcal{G})E(\mathbf{q}_k) . \tag{34}$$

It follows then that

$$\|E(\mathbf{q}_k)\| \le \zeta^k \|E\mathbf{q}_0\| \tag{35}$$

where

$$\zeta = \|I - \alpha\mathcal{G}\| . \tag{36}$$

Hence equation (31a) follows when $\zeta < 1$, or equivalently, for

$$0 < \alpha < 2/\rho_N \tag{37}$$

where $\rho_N$ is the largest eigenvalue of $\mathcal{G}$.
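To prove equation (31b), observe that

$$E(\mathbf{q}_{k+1}'\mathbf{q}_{k+1}) = E[\mathbf{q}_k'(I - \alpha H_k)^2\mathbf{q}_k] - E[2\alpha\,\mathbf{q}_k'(I - \alpha H_k)\mathbf{h}_k] + \alpha^2\|\mathbf{h}_k\|^2 \tag{38}$$

from equation (33). Noting again that $\mathbf{q}_k$ is independent of $H_k$, we
have

$$E[\mathbf{q}_k'(I - \alpha H_k)^2\mathbf{q}_k] = E[\mathbf{q}_k' E[(I - \alpha H_k)^2]\,\mathbf{q}_k] \le \mu\,\|\mathbf{q}_k\|^2 , \tag{39}$$

where

$$\mu = \|E[(I - \alpha H_k)^2]\| . \tag{40}$$

Also, using the Schwarz inequality,

$$E[-\mathbf{q}_k'(I - \alpha H_k)\mathbf{h}_k] = \alpha\, E(\mathbf{q}_k)' E(H_k\mathbf{h}_k) \le \alpha\,\|E\mathbf{q}_k\|\, J$$

where $J = \|E(H_k\mathbf{h}_k)\|$. Using equation (35) we obtain

$$-E[\mathbf{q}_k'(I - \alpha H_k)\mathbf{h}_k] \le \alpha\,\zeta^k \|E(\mathbf{q}_0)\|\, J . \tag{41}$$

The bounds (39) and (41) may be applied to equation (38), yielding

$$\|\mathbf{q}_{k+1}\|^2 \le \mu\,\|\mathbf{q}_k\|^2 + 2\alpha^2 J\,\|E\mathbf{q}_0\|\,\zeta^k + \alpha^2\|\mathbf{h}_k\|^2 . \tag{42}$$

If we now define the bounding sequence of positive numbers $Q_k$ according to

$$Q_0 = \|E\mathbf{q}_0\|^2$$

and

$$Q_{k+1} = \mu Q_k + 2\alpha^2 J\,\|E(\mathbf{q}_0)\|\,\zeta^k + \alpha^2\|\mathbf{h}_k\|^2 , \tag{43}$$

then it follows from (42) that

$$\|\mathbf{q}_k\|^2 \le Q_k .$$

But the difference equation given by (43) has the asymptotic solution

$$\lim_{k \to \infty} Q_k = \frac{\alpha^2\|\mathbf{h}_k\|^2}{1 - \mu}$$

for $\zeta < 1$ and $\mu < 1$. Then

$$\limsup_{k \to \infty} \|\mathbf{q}_k\|^2 \le \frac{\alpha^2\|\mathbf{h}_k\|^2}{1 - \mu} . \tag{44}$$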
Notice that $\|\mathbf{h}_k\|$ is independent of $k$ by the hypothesis of
stationarity. Since

$$E[(I - \alpha H_k)^2] = (I - \alpha\mathcal{G})^2 + \alpha^2 E(G_k^2) ,$$

where

$$G_k = H_k - \mathcal{G} , \tag{45}$$

we find that

$$\mu \le \|I - \alpha\mathcal{G}\|^2 + \alpha^2\|E(G_k^2)\| ,$$

that is,

$$\mu \le \zeta^2 + \alpha^2\gamma \tag{46}$$

where $\gamma = \|E(G_k^2)\|$. Furthermore for $\alpha < 2/(\rho_1 + \rho_N)$, we
have $\zeta = 1 - \alpha\rho_1$. Then, using (46), we see that

$$1 - \mu \ge 2\alpha\rho_1 - \alpha^2(\rho_1^2 + \gamma) .$$

We have therefore shown that for positive and sufficiently small $\alpha$,
equations (31b) and (32) are valid where

$$\gamma(\alpha)^2 = \frac{\alpha\,\|\mathbf{h}_k\|^2}{2\rho_1 - \alpha(\rho_1^2 + \gamma)} . \tag{47}$$